Towards Computation, Space, and Data Efficiency in de novo DNA Assembly: A Novel Algorithmic Framework

Authors

  • Ka Kit Lam
  • Nihar B. Shah
Abstract

We consider the problem of de novo DNA sequencing from shotgun data, wherein an underlying (unknown) DNA sequence is to be reconstructed from several short substrings of the sequence. We propose a de novo assembly algorithm which requires only the minimum amount of data and is efficient with respect to space and computation. We design the algorithm from the information-theoretic perspective of using the minimum amount of data. The key idea to achieve space and computational efficiency is to break the procedure into two phases, an online phase and an offline phase. We remark that this can serve as evidence of the feasibility of using an information-theoretic perspective to guide practical algorithmic design in DNA sequencing. Preliminary work on extending this algorithmic framework to more realistic settings is also reported.
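
To make the problem statement concrete, the following is a minimal sketch of shotgun reconstruction on a toy sequence. It is not the algorithm proposed in the paper, whose online and offline phases are not described in this abstract; it is a naive greedy overlap merge, and the names `shotgun_reads`, `overlap`, and `greedy_assemble` are hypothetical, chosen only for illustration. It assumes error-free reads and a sequence without long repeats.

```python
def shotgun_reads(seq: str, read_len: int, step: int) -> list[str]:
    """Toy stand-in for shotgun data: fixed-length substrings at regular offsets."""
    return [seq[i:i + read_len] for i in range(0, len(seq) - read_len + 1, step)]


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str]) -> str:
    """Naive greedy assembly: repeatedly merge the pair with the largest overlap."""
    # Drop duplicate reads and reads wholly contained in another read.
    frags = sorted(
        r for r in set(reads) if not any(r != s and r in s for s in reads)
    )
    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the best pair, keeping the overlapping region only once.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags[0]


if __name__ == "__main__":
    genome = "ATGCGTACGTTAGC"  # toy sequence, assumed for illustration
    reads = shotgun_reads(genome, read_len=6, step=2)
    print(greedy_assemble(reads) == genome)
```

A greedy merge of this kind compares all pairs of fragments at every step and can fail on repeated regions, so it makes no claim to the data, space, or computational efficiency that the abstract describes.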

Similar resources

Clustering of Short Read Sequences for de novo Transcriptome Assembly

Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...


Evolutionary algorithms and de novo peptide design

Automated de novo design of bioactive molecules is one of the long-sought goals in computational chemistry. Despite significant progress in computational approaches for ligand design and efficient evaluation of binding energy, novel procedures for ligand design are required. Evolutionary computation provides a new approach to this design issue. This paper proposes a framework for evolving ligands...


Finding Exact and Solo LTR-Retrotransposons in Biological Sequences Using SVM

Finding repetitive subsequences in genomes is a challenging problem in bioinformatics research. Many approaches have been proposed to solve the problem, which can be divided into library-based and de novo methods. The library-based methods use predetermined repetitive subsequences of the genome, whereas library-less methods attempt to discover repetitive subsequences by analytical approach...


De Novo Ultrascale Atomistic Simulations On High-End Parallel Supercomputers

We present a de novo hierarchical simulation framework for first-principles based predictive simulations of materials and their validation on high-end parallel supercomputers and geographically distributed clusters. In this framework, high-end chemically reactive and non-reactive molecular dynamics (MD) simulations explore a wide solution space to discover microscopic mechanisms that govern macr...


P-70: Evidence for Differential Gene Expression of A Major Epigenetic Modifier Enzyme, de novo DNA Methyltransferase 3b, through Vitrification of Mouse Ovary Tissue

Background: Ovarian tissue cryopreservation is a feasible method to preserve female reproductive potential, especially in young patients with cancer or in women at risk of premature ovarian failure. Vitrification has recently emerged as a new trend for biological specimen preservation. On the other hand, gene expression that changes during vitrification can influence oocyte maturation and need ...


Publication date: 2013